Abstract

Call graphs depict the static caller-callee relation between "functions" in a program. Since most source and target languages support functions as the primitive unit of composition, call graphs naturally form the fundamental control flow representation available for understanding and developing software. They are also the substrate on which various interprocedural analyses are performed, and they are an integral part of program comprehension and testing. Given their universality and usefulness, it is imperative to ask whether call graphs exhibit any intrinsic graph-theoretic features across versions, program domains and source languages. This work is an attempt to answer these questions: we present and investigate a set of meaningful graph measures that help us understand call graphs better; we establish how these measures correlate, if at all, across different languages and program domains; and we assess overall, language-independent software quality by suitably interpreting these measures.



1 Introduction 

Complexity is one of the most pertinent characteristics of computer programs and, thanks to Moore's law, computer programs are becoming ever larger and more complex; it is not atypical for a software product to contain hundreds of thousands, even millions, of lines of code whose individual components interact in myriad ways. To tackle such complexity, a variety of code-organizing motifs have been proposed. Of these motifs, functions form the most fundamental unit of source code: software is organized as a set of functions of varying granularity and utility, each computing results from its arguments. A critical feature of this organizing principle is that functions themselves can call other functions. This naturally leads to the notion of a function call graph, where individual functions are nodes and edges represent caller-callee relations; the indegree of a node is the number of functions that could call it, and its outdegree is the number of functions it can call. Since no further restrictions are imposed, the caller-callee relation induces a generic graph structure, possibly with loops and cycles.

*In reverence to the Wizard Book.
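As a minimal sketch of this definition (in Python, with hypothetical function names and a made-up example program), a call graph can be stored as a map from each caller to the set of functions it calls; indegree and outdegree then follow by counting. Self-calls are kept as loops, and repeated call sites collapse into a single edge.

```python
def build_call_graph(calls):
    """Map every function to the set of functions it calls.

    `calls` is an iterable of (caller, callee) pairs; repeated pairs collapse
    into one edge, and functions that call nothing still appear as nodes.
    """
    graph = {}
    for caller, callee in calls:
        graph.setdefault(caller, set()).add(callee)
        graph.setdefault(callee, set())
    return graph


def degrees(graph):
    """Return (indegree, outdegree) dictionaries keyed by function name."""
    outdeg = {f: len(callees) for f, callees in graph.items()}
    indeg = dict.fromkeys(graph, 0)
    for callees in graph.values():
        for callee in callees:
            indeg[callee] += 1
    return indeg, outdeg


# Hypothetical program: main calls parse and eval; eval calls itself and apply.
calls = [("main", "parse"), ("main", "eval"), ("eval", "eval"), ("eval", "apply")]
g = build_call_graph(calls)
indeg, outdeg = degrees(g)
print(indeg["eval"], outdeg["eval"])  # 2 2
```

The loop eval -> eval counts once towards both the indegree and the outdegree of eval.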

In this work we study the topology of such (static) call graphs. Our present understanding of call graphs is limited. We know that call graphs are directed and sparse; that they can have cycles, and often do; that they are not strongly connected; and that they evolve over time and could exhibit preferential attachment of nodes and edges. Beyond these basic facts, we know little about the topology of call graphs.

2 Contributions 

In this paper we answer questions pertaining to topological properties of call graphs by studying a representative set of open-source programs. In particular, we ask the following questions: What is the structure of call graphs? Are there any consistent properties? Are some properties inherent to a certain programming language or problem class? To answer these questions, we investigate a set of meaningful metrics from the plethora of graph properties 0. Our specific contributions are:

1) We motivate and provide insights into why certain call graph properties are useful and how they could help us develop better and more robust software.
2) We compare the graph structure induced by different language paradigms on a common, structurally immediate representation: call graphs. The authors are unaware of any study that systematically compares the call graphs of different languages; in particular, the "call graph" structure of functional languages.
3) Our corpus, being varied and large, is far more statistically representative than those of similar studies ([24], [4], [18]).
4) Apart from confirming previous results in a rigorous manner, we compute new metrics that capture finer aspects of graph structure.
5) As a side effect, we provide a potential means to assess software quality independently of the source language.



The rest of the paper is organized as follows. We begin by justifying the utility of our study and then introduce the relevant structural measures in Section 4. Section 5 discusses the corpus and methodology. We then present our measurements and their interpretation (Section 6). We conclude with Sections 7 and 8.

3 Motivation 

Call graphs define the set of permissible interactions and information flows and can influence software processes in non-trivial ways. To give the reader an intuitive understanding of how graph topology could influence software processes, we present the following four scenarios.

Bug Propagation Dynamics: Consider how a bug in some function affects the rest of the software. Let foo call bar, and let bar return an incorrect value because of a bug in bar. If foo incorporates this return value in its own computation, it is likely to compute a wrong answer as well; that is, bar has infected foo. Such an infection is contagious and, in principle, bar can infect any function $f_n$ as long as $f_n$ is connected to bar. Thus connectedness as a graph property translates directly into infectability. Indeed, with appropriate notions of infection propagation and immunization, one could model bug expression as an epidemic process. It is well known that graph topology can influence the stationary distribution of this process. In particular, the critical infection rate (the infection rate beyond which an infection cannot be contained) is highly network specific; in fact, certain networks are known to have zero critical thresholds [5]. It pays to know whether call graphs are instances of such graphs.

Software Testing: Different functions contribute differently to software stability. Certain functions, when buggy, are likely to render the system unusable. Such functions, whose correctness is central to the statistical correctness of the software, are traditionally characterized by per-function attributes like indegree and size. Such simple measures, though useful, fail to capture the transitive dependencies that could render even a not-so-well-connected function an Achilles heel. Having unambiguous metrics that measure a node's importance helps make software testing more efficient. Centrality is one such measure of a node's importance in a graph. Once relevant centrality measures are assigned, one could spend relatively more time testing central functions or, equally, test central functions and their calling contexts for prevalent error modes like interface nonconformity, context disparity and the like (EH, 0). By considering node centralities, one could bias the testing effort to achieve similar confidence levels without a costlier uniform or random testing schedule. Though most developers intuitively know the importance of individual functions and devise elaborate test cases to stress these functions accordingly, we believe such an idiosyncratic methodology could safely be replaced by an informed and statistically tenable biasing based on centralities. Centrality is also readily helpful in software impact analysis.

Software Comprehension: Understanding call graph structure helps us construct tools that assist developers in comprehending software better. For instance, consider a tool that magically extracts higher-level structures from a program's call graph by grouping related, lower-level functions. When run on a kernel code base, such a tool would automatically identify different logical subsystems, say, networking, filesystem, memory management or scheduling. Devising such a tool amounts to finding appropriate similarity metrics that partition the graph so that nodes within a partition are "more" similar to each other than to nodes outside it. Understandably, different notions of similarity entail different groupings. Recent studies show how network structure controls such grouping Q and how per-node graph metrics can be used to improve developer-perceived clustering validity ([26], [17]).

Interprocedural Analysis: Call graph topology can influence both the precision and the convergence of Interprocedural Analysis (IPA). When specializing individual procedures in a program, procedures with large indegree may end up specialized less effectively: dataflow facts for these functions tend to be too conservative, as they must be consistent across a large number of call sites. By cloning nodes with large indegree and distributing the indegree "appropriately" among the clones, one could specialize individual clones better. Also, the number of iterations an iterative IPA takes to compute a fixed point depends on max(longest path length, largest cycle).

4 Statistical Properties of Interest 

As with most nascent sciences, the graph topology literature is strewn with notions that are overlapping, correlated and gratuitously misused; for clarity, we restrict ourselves to the following structural notions. A note on usage: we use graphs and networks interchangeably; $G = (V, E)$, $|V| = n$ and $|E| = m$; $(i, j)$ implies $i$ calls $j$; $d_i$ denotes the degree of vertex $i$ and $d_{ij}$ denotes the geodesic distance between $i$ and $j$; $N(i)$ denotes the immediate neighbours of $i$; graphs are directed and simple: for every two distinct edges $(i_1, j_1)$ and $(i_2, j_2)$ present, either $i_1 \neq i_2$ or $j_1 \neq j_2$ is true.

Graphs, in general, can be modeled as random, small world, power-law or scale rich, each permitting different dynamics.

Random graphs: the random graph model [11] is perhaps the simplest network model: undirected edges are added at random between a fixed number $n$ of vertices to create a network in which each of the $\frac{1}{2}n(n-1)$ possible edges is independently present with some probability $p$, and the vertex degree distribution is Poisson in the limit of large $n$ (with $np$ held fixed).
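A short sketch of the $G(n, p)$ model using only Python's standard library; the function name and the parameter values are illustrative assumptions. It draws each of the $\frac{1}{2}n(n-1)$ possible edges independently and checks that the mean degree lands near its expected value $p(n-1)$, with a degree variance close to the mean as the Poisson limit predicts.

```python
import random
from itertools import combinations


def gnp_random_graph(n, p, seed=None):
    """Erdos-Renyi G(n, p): every unordered pair (i, j), i < j, becomes an
    undirected edge independently with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]


n, p = 1000, 0.004
edges = gnp_random_graph(n, p, seed=1)
deg = [0] * n
for i, j in edges:
    deg[i] += 1
    deg[j] += 1

mean_degree = sum(deg) / n
var_degree = sum((d - mean_degree) ** 2 for d in deg) / n
# For large n with p(n-1) fixed, degrees are close to Poisson: variance ~ mean.
print(f"expected {p * (n - 1):.3f}, mean {mean_degree:.3f}, variance {var_degree:.3f}")
```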

Small world graphs: exhibit a high degree of clustering and have a mean geodesic distance $\ell$, defined as

$\ell^{-1} = \frac{1}{\frac{1}{2}n(n+1)} \sum_{i \ge j} d_{ij}^{-1}$,

in the range of $\log n$; that is, the number of vertices within a distance $r$ of a typical central vertex grows exponentially with $r$ [13].

It should be noted that a large number of networks, including random networks, have $\ell$ in the range of $\log n$ or even $\log \log n$. In this work, we deem a network to be small world if $\ell$ grows sub-logarithmically and the network exhibits high clustering.
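A minimal sketch of the harmonic-mean geodesic distance $\ell$, assuming the call graph is symmetrized (edge direction ignored) and unweighted. Distances come from one breadth-first search per vertex; unreachable pairs contribute $1/\infty = 0$, which is why the harmonic mean remains finite on graphs that are not connected. The $i = j$ terms, where $d_{ii} = 0$, are skipped as an assumption of this sketch, while the $\frac{1}{2}n(n+1)$ normalization of the formula is kept. The function name is hypothetical.

```python
from collections import deque


def harmonic_mean_geodesic(adj):
    """l with l^{-1} = (1 / (n(n+1)/2)) * sum over pairs i > j of 1/d_ij.

    adj maps each node to its callees; edges are symmetrized, unreachable
    pairs contribute 0, and the i == j terms are skipped.
    """
    nbrs = {}
    for u, vs in adj.items():
        nbrs.setdefault(u, set())
        for v in vs:
            if v != u:
                nbrs[u].add(v)
                nbrs.setdefault(v, set()).add(u)
    n = len(nbrs)
    inv_sum = 0.0
    for s in nbrs:
        # Breadth-first search gives geodesic distances from s.
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in nbrs[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        inv_sum += sum(1.0 / d for t, d in dist.items() if t != s)
    inv_sum /= 2  # every unordered pair was counted from both endpoints
    inv_l = inv_sum / (n * (n + 1) / 2)
    return float("inf") if inv_l == 0 else 1.0 / inv_l


# Path a-b-c: pairs at distances 1, 1 and 2.
print(harmonic_mean_geodesic({"a": ["b"], "b": ["c"]}))
```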

Power law networks: These are networks whose degree distribution follows the discrete CCDF $P[X > x] = c\,x^{-\gamma}$, where $c$ is a fixed constant and $\gamma$ is the scaling exponent. On a double logarithmic plot, this CCDF appears as a straight line of slope $-\gamma$. The sole response of power-law distributions to conditioning is a change in scale: for large values of $x$, $P[X > x \mid X > x_i]$ has the same form as the unconditional distribution $P[X > x]$, only rescaled. This "scale invariance" of power-law distributions is what is referred to as scale-freeness. Note that this notion of scale-freeness does not imply fractal-like self-similarity at every scale.
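A one-line derivation makes the scale-invariance claim concrete, assuming the tail takes the exact form $P[X > x] = c\,x^{-\gamma}$ for all $x \ge x_i$ (real degree data only approximate this beyond some cutoff):

$$
P[X > x \mid X > x_i] \;=\; \frac{P[X > x]}{P[X > x_i]} \;=\; \frac{c\,x^{-\gamma}}{c\,x_i^{-\gamma}} \;=\; \left(\frac{x}{x_i}\right)^{-\gamma}, \qquad x \ge x_i .
$$

The constant $c$ cancels and the exponent $\gamma$ is unchanged: conditioning on $X > x_i$ only measures $x$ in units of $x_i$. An exponential tail $e^{-\lambda x}$, by contrast, conditions to $e^{-\lambda (x - x_i)}$, which shifts rather than rescales.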

Graphs with similar degree distributions can differ widely in other structural aspects; the remaining definitions introduce metrics that permit finer classifications.

degree correlations: In many real-world graphs, the probability of attachment to the target vertex also depends on the degree of the source vertex: many networks show assortative mixing on their degrees, that is, a preference for high-degree nodes to attach to other high-degree nodes; others show disassortative mixing, where high-degree nodes consistently attach to low-degree ones. The following measure, a variant of the Pearson correlation coefficient [20], gives the degree correlation:

$\rho = \dfrac{m^{-1}\sum_i j_i k_i - \left[m^{-1}\sum_i \frac{1}{2}(j_i + k_i)\right]^2}{m^{-1}\sum_i \frac{1}{2}(j_i^2 + k_i^2) - \left[m^{-1}\sum_i \frac{1}{2}(j_i + k_i)\right]^2}$,

where $j_i$ and $k_i$ are the degrees of the vertices at the ends of the $i$th edge, with $i = 1, \ldots, m$. $\rho$ takes values in the range $-1 \le \rho \le 1$, with $\rho > 0$ signifying assortativity and $\rho < 0$ signifying disassortativity; $\rho = 0$ when there is no discernible correlation between the degrees of nodes that share an edge.
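The coefficient translates directly into code. This minimal sketch (the function name is hypothetical) takes, for each edge, the degrees $(j_i, k_i)$ of its two end vertices; which degree is used (total, in or out) is left to the caller, so the same routine could be fed in-in or out-out degree pairs. Because the formula averages over both ends of each edge, it treats the two ends interchangeably.

```python
def degree_correlation(edge_degrees):
    """Degree correlation rho for a list of (j_i, k_i) pairs, where j_i and
    k_i are the degrees of the two end vertices of edge i."""
    m = len(edge_degrees)
    prod = sum(j * k for j, k in edge_degrees) / m                 # m^-1 sum j_i k_i
    mean = sum((j + k) / 2 for j, k in edge_degrees) / m           # m^-1 sum (j_i + k_i)/2
    sq = sum((j * j + k * k) / 2 for j, k in edge_degrees) / m     # m^-1 sum (j_i^2 + k_i^2)/2
    denom = sq - mean * mean
    if denom == 0:
        raise ValueError("all endpoint degrees are equal: rho is undefined")
    return (prod - mean * mean) / denom


# Star with one hub of degree 3 and three leaves of degree 1: every edge joins
# a high-degree node to a low-degree one, i.e. perfect disassortativity.
print(degree_correlation([(3, 1), (3, 1), (3, 1)]))  # -1.0
```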

scale free metric: a useful measure capturing the fractal nature of graphs is the scale-free metric $s(g)$ [16], defined as $s(g) = \sum_{(i,j) \in E} d_i d_j$, along with its normalized variant $S(g) = s(g)/s_{\max}$, where $s_{\max}$ is the maximal $s(g)$ and is dictated by the type of network under study¹. The rest of the paper uses the normalized variant.

¹ For unrestricted graphs, $s_{\max} = \sum_{i=1}^{n} (d_i/2)\, d_i^2$.

$s(g)$ is maximal when nodes with similar degree connect to each other [13]: thus, $S(g)$ is close to one for networks that are fractal-like, where the connectivity stays similar at all degrees. On the other hand, in networks where nodes repeatedly connect to dissimilar nodes, $S(g)$ is close to zero. Networks that exhibit a power law but have a scale-free metric $S(g)$ close to zero are called scale rich; power-law networks whose $S(g)$ is close to one are called scale-free. The measures $S(g)$ and $\rho$ are similar and correlated, but they employ different normalizations and are useful in discerning different features [16].
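A minimal sketch of $s(g)$ and $S(g)$ for an undirected simple graph, assuming the unrestricted bound $s_{\max} = \sum_i (d_i/2)\,d_i^2 = \sum_i d_i^3/2$. For graphs constrained to a fixed degree sequence or other structure, $s_{\max}$ would have to be computed for that class instead; the function name is hypothetical.

```python
def scale_free_metric(edges):
    """Return (s, S) for an undirected simple graph given as an edge list.

    s(g) = sum over edges (i, j) of d_i * d_j
    S(g) = s(g) / s_max, with the unrestricted bound s_max = sum_i d_i^3 / 2
    """
    deg = {}
    for i, j in edges:
        deg[i] = deg.get(i, 0) + 1
        deg[j] = deg.get(j, 0) + 1
    s = sum(deg[i] * deg[j] for i, j in edges)
    s_max = sum(d ** 3 for d in deg.values()) / 2
    return s, s / s_max


print(scale_free_metric([(0, 1), (1, 2), (2, 0)]))  # triangle: (12, 1.0)
print(scale_free_metric([(0, 1), (0, 2), (0, 3)]))  # star: (9, 0.6)
```

In the triangle every edge joins equal degrees, so the bound is attained; in the star every edge joins the hub to a leaf, pulling $S(g)$ below one.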

clustering coefficient: is a measure of how clustered, or locally structured, a graph is: it depicts how interconnected, on average, each node's neighbours are. Specifically, if node $v$ has $k_v$ immediate neighbours, then the clustering coefficient for that node, $C_v$, is the ratio of the number of edges $E_v$ present between its neighbours to the total possible connections between $v$'s neighbours, $k_v(k_v - 1)/2$. The whole-graph clustering coefficient $C$ is the average of the $C_v$s, that is,

$C = \langle C_v \rangle_v = \left\langle \dfrac{2E_v}{k_v(k_v - 1)} \right\rangle_v$.
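A small sketch of $C$, assuming the call graph has already been symmetrized into an undirected neighbour map without self-loops; nodes with fewer than two neighbours are given $C_v = 0$, which is one common convention among several. The companion helper gives the expected clustering of a random $G(n, p)$ graph with the same $n$ and $m$, namely $p = 2m/(n(n-1))$, the kind of baseline a comparison against random networks needs. Both function names are hypothetical.

```python
from itertools import combinations


def clustering(adj):
    """Whole-graph clustering coefficient C = <C_v>_v.

    adj maps every node to the set of its neighbours; it must already be
    symmetric and free of self-loops. Nodes with k_v < 2 contribute C_v = 0.
    """
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        e_v = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])  # E_v
        total += e_v / (k * (k - 1) / 2)
    return total / len(adj)


def random_clustering(n, m):
    """Expected C of a G(n, p) graph with the same n and m: p = 2m / (n (n - 1))."""
    return 2 * m / (n * (n - 1))


# Triangle 0-1-2 with a pendant node 3 attached to 0.
paw = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(clustering(paw), random_clustering(4, 4))
```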

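clustering profile: $C$ has limited use when immediate connectivity is sparse. To understand the interconnection profile of transitively connected neighbours, we use the clustering profile [1]:

$C^{d}_{k} = \dfrac{1}{|\{i \mid d_i = k\}|} \sum_{\{i \mid d_i = k\}} C_i^{d}$, where $C_i^{d} = \dfrac{|\{(j, l) \mid j, l \in N(i),\ d^{G(V \setminus i)}_{jl} = d\}|}{\binom{d_i}{2}}$

is the fraction of pairs of neighbours of $i$ that lie at geodesic distance $d$ from each other once $i$ is removed from the graph. Note that, by this definition, the clustering coefficient $C_i$ of node $i$ is simply $C_i^{1}$, and averaging $C_i^{1}$ over all nodes recovers $C$.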
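A minimal sketch of the clustering profile, assuming an undirected neighbour map and reading $C_i^{d}$ as the fraction of pairs of neighbours of $i$ that are exactly $d$ hops apart once $i$ is deleted, averaged over nodes of a given degree. It runs one breadth-first search per neighbour, so it is quadratic in degree and meant for illustration rather than for kernel-sized graphs; all names are hypothetical.

```python
from collections import deque


def distances_without(adj, source, removed):
    """Breadth-first distances from `source` in the graph with `removed` deleted."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v != removed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def node_profile(adj, i, d):
    """C_i^d: share of neighbour pairs of i at distance exactly d in G minus i."""
    nbrs = list(adj[i])
    k = len(nbrs)
    if k < 2:
        return 0.0
    hits = 0
    for a in range(k):
        dist = distances_without(adj, nbrs[a], i)
        for b in range(a + 1, k):
            if dist.get(nbrs[b]) == d:
                hits += 1
    return hits / (k * (k - 1) / 2)


def clustering_profile(adj, degree, d):
    """C^d_k: average of C_i^d over all nodes i whose degree equals `degree`."""
    group = [i for i in adj if len(adj[i]) == degree]
    if not group:
        return 0.0
    return sum(node_profile(adj, i, d) for i in group) / len(group)


# A 4-cycle: each node's two neighbours are 2 hops apart once the node is removed.
square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(clustering_profile(square, 2, 1), clustering_profile(square, 2, 2))  # 0.0 1.0
```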

centrality: the centrality of a node is a measure of its relative importance within the graph; central nodes are both points of opportunity, in that they can reach or influence most nodes in the graph, and points of constraint, in that any perturbation in them is likely to have a greater impact on the graph. Many centrality measures exist and have been successfully used in many contexts ([6], [10]). Here we focus on the betweenness centrality $B_u$ of node $u$, defined as the ratio of the number of geodesic paths that pass through $u$ to the total number of geodesic paths: that is, $B_u = \sum_{i \neq u \neq j} \sigma_{ij}(u)/\sigma_{ij}$, where $\sigma_{ij}$ is the number of geodesic paths from $i$ to $j$ and $\sigma_{ij}(u)$ is the number of those that pass through $u$. Nodes that occur on many shortest paths between other vertices have higher betweenness than those that do not.
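Computing $B_u$ on graphs with tens of thousands of nodes does not require enumerating geodesics explicitly. The sketch below swaps in Brandes' algorithm, a standard way to accumulate $\sigma_{ij}(u)/\sigma_{ij}$ with one breadth-first search per source; it is an illustration, not the tool used for the measurements. It assumes an unweighted, directed call graph without duplicate edges, given as a caller-to-callees map, sums over ordered pairs $(i, j)$, and returns unnormalized values; the function name is hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness B_u of every node (Brandes' algorithm).

    adj maps each caller to an iterable of distinct callees (unweighted,
    directed). B_u sums sigma_ij(u) / sigma_ij over ordered pairs i != u != j.
    """
    nodes = set(adj) | {w for ws in adj.values() for w in ws}
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = dict.fromkeys(nodes, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Diamond: a calls b and c, both of which call d; two geodesics from a to d.
bc = betweenness({"a": ["b", "c"], "b": ["d"], "c": ["d"]})
print(bc["b"], bc["c"])  # 0.5 0.5
```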

connected components: the size and number of connected components give us the macroscopic connectivity of the graph. In particular, the number and size of strongly connected components give us the extent of mutual recursion present in the software. The number of weakly connected components gives us an upper bound on the number of runtime indirection resolutions possible.
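The component statistics used later (#WCC, #SCC and the share of nodes in the largest SCC) can be sketched as follows. Kosaraju's two-pass algorithm is swapped in here for strongly connected components, with explicit stacks so that deep call chains do not hit Python's recursion limit; weakly connected components are found by ignoring edge direction. The input format (a caller-to-callees map) and all function names are assumptions of this sketch.

```python
def _nodes(adj):
    """All functions that appear as a caller or a callee."""
    return set(adj) | {w for ws in adj.values() for w in ws}


def strongly_connected_components(adj):
    """Kosaraju's algorithm using explicit stacks instead of recursion."""
    nodes = _nodes(adj)
    # Pass 1: depth-first search on G, recording nodes in order of completion.
    order, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(adj.get(root, ())))]
        while stack:
            v, children = stack[-1]
            for w in children:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(adj.get(w, ()))))
                    break
            else:  # all children explored: v is finished
                stack.pop()
                order.append(v)
    # Pass 2: sweep the reversed graph in decreasing order of completion.
    radj = {v: [] for v in nodes}
    for v, ws in adj.items():
        for w in ws:
            radj[w].append(v)
    comps, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        comp, todo = [], [root]
        while todo:
            v = todo.pop()
            comp.append(v)
            for w in radj[v]:
                if w not in assigned:
                    assigned.add(w)
                    todo.append(w)
        comps.append(comp)
    return comps


def weakly_connected_components(adj):
    """Components of the call graph with edge directions ignored."""
    nodes = _nodes(adj)
    und = {v: set() for v in nodes}
    for v, ws in adj.items():
        for w in ws:
            und[v].add(w)
            und[w].add(v)
    comps, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        comp, todo = [], [root]
        while todo:
            v = todo.pop()
            comp.append(v)
            for w in und[v] - seen:
                seen.add(w)
                todo.append(w)
        comps.append(comp)
    return comps


def component_stats(adj):
    """Return (#WCC, #SCC, fraction of nodes in the largest SCC)."""
    sccs = strongly_connected_components(adj)
    n = sum(len(c) for c in sccs)
    return len(weakly_connected_components(adj)), len(sccs), max(map(len, sccs)) / n


# Hypothetical graph: a and b are mutually recursive, b also calls c, d is isolated.
print(component_stats({"a": ["b"], "b": ["a", "c"], "d": []}))  # (2, 3, 0.5)
```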

Figure 1. Indegree Distribution

Figure 2. Outdegree Distribution

edge reciprocity: measures whether edges are reciprocal, that is, whether $(j, i) \in E$ whenever $(i, j) \in E$. A robust measure of reciprocity is defined as [12]: $\rho = \dfrac{r - \bar{a}}{1 - \bar{a}}$, where $r = \dfrac{\sum_{i \neq j} a_{ij} a_{ji}}{m}$ and $\bar{a}$ is the mean of the off-diagonal values in the adjacency matrix, $\bar{a} = m/(n(n-1))$. This measure is absolute: $\rho$ greater than zero implies larger reciprocity than in random networks, and $\rho$ less than zero implies smaller reciprocity than in random networks.
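A direct transcription of the reciprocity measure, offered as a sketch: it assumes a directed call graph given as a caller-to-callees map, ignores self-loops (which the off-diagonal mean excludes), and takes $\bar a$ as the mean of the off-diagonal adjacency entries, $m/(n(n-1))$. The function name is hypothetical, and the measure is undefined for an empty or complete graph, where the denominators vanish.

```python
def reciprocity(adj):
    """Reciprocity rho = (r - a_bar) / (1 - a_bar) of a directed graph.

    r     : fraction of directed edges (i, j) whose reverse (j, i) also exists
    a_bar : m / (n (n - 1)), the mean off-diagonal entry of the adjacency matrix
    Self-loops are ignored.
    """
    nodes = set(adj) | {w for ws in adj.values() for w in ws}
    edges = {(v, w) for v, ws in adj.items() for w in ws if v != w}
    n, m = len(nodes), len(edges)
    if m == 0 or m == n * (n - 1):
        raise ValueError("rho is undefined for empty or complete graphs")
    r = sum(1 for v, w in edges if (w, v) in edges) / m
    a_bar = m / (n * (n - 1))
    return (r - a_bar) / (1 - a_bar)


# Hypothetical graph: a and b call each other, b also calls c.
print(reciprocity({"a": ["b"], "b": ["a", "c"]}))  # about 0.333
```

A strictly layered chain such as a -> b -> c gives $r = 0$ and hence a negative $\rho$.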

5 Corpora & Methodology 

We studied 35 open source projects. The projects are written in four languages: C, C++, OCaml and Haskell. Appendix I lists these programs along with their source language, version, domain and size: the number of nodes N and the number of edges M. Most programs used are large, used by tens of thousands of users, written by hundreds of developers and developed over years. These programs are actively developed and supported. Most of them, from proof assistant to media player, provide varied functionality and have no apparent similarity or overlap in usage, philosophy or developers; if anything, they exhibit greater orthogonality: Emacs vs. Vim, OCaml vs. GCC, Postgres vs. Framerd, to name a few. Many are stand-alone programs, while a few, like glibc and ffmpeg, are provided as libraries. Some programs, like Linux and glibc, have machine-dependent components, while others, like yarrow and psilab, are entirely architecture independent.

In essence, our sample is not biased towards particular applications, source languages, operating systems, program sizes, program features or development philosophies. The corpus versions and ages vary widely: some are a few years old, while others, like gcc, the Linux kernel and OCamlc, are more than a decade old. We believe that any invariant we find in such a varied collection is likely universal.

We used a modified version of CodeViz [7] to extract call graphs from C/C++ sources. For OCaml and Haskell, we compiled the sources to binaries and used this modified CodeViz to extract call graphs from the binaries. OCaml programs were compiled using ocamlopt, while for Haskell we used GHC. A note of caution: to handle Haskell's laziness, GHC uses indirect jumps. Our tool presently handles such calls only marginally; we urge the reader to be mindful of measures that are easily perturbed by edge additions.

We used custom-developed graph analysis tools to measure most of the properties; where possible, we also used the graph-tool software [13]. We used the largest weakly connected component of each call graph for our measurements. Component statistics were computed for the whole data set.

6 Interpretation 

In this section we walk through the results and discuss what they mean and why they are of interest to the language and software communities. Note that most plots show the estimated sample variance as the confidence indicator. Also, most plots include a horizontal line that separates data from different languages.

Degree Distribution: Fitting samples to a distribution is impossibly thorny: any sample is finite, but there are infinitely many candidate distributions. Despite the hardness of this problem, many previous results were based either on visual inspection of the data or on linear regression, and are likely to be inaccurate.

We fit the cumulative distribution to the data and, to improve confidence, compute likelihood measures for alternative distributions using [8]. Figures 1 and 2 depict how four programs written in four different language paradigms compare: the indegree distribution admits a power law ($2.3 < \gamma < 2.9$), while the outdegree distribution admits an exponential distribution (Haskell results are coarse, but valid). This observation, that indegree and outdegree distributions differ consistently across languages, is expected, as indegree and outdegree are conditioned very differently during the development process.
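The likelihood-based comparison can be sketched as follows, in the spirit of [8] but simplified: it fits a continuous power law and a shifted exponential to the same tail by maximum likelihood and compares their log-likelihoods, a positive difference favouring the power law. The lower cutoff $x_{\min}$ is supplied by hand, whereas [8] selects it by minimizing a Kolmogorov-Smirnov distance, and degree data are discrete, so the continuous fit is only an approximation. Here $\alpha$ is the exponent of the density; for a continuous power law the slope of the CCDF on a log-log plot is $\alpha - 1$. Function names and the synthetic data are assumptions.

```python
import math
import random


def fit_power_law(xs, x_min):
    """Continuous power-law MLE on the tail x >= x_min.

    Density: p(x) = (alpha - 1) / x_min * (x / x_min) ** (-alpha).
    Returns (alpha_hat, log_likelihood).
    """
    tail = [x for x in xs if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    alpha = 1.0 + n / log_sum
    ll = n * math.log(alpha - 1.0) - n * math.log(x_min) - alpha * log_sum
    return alpha, ll


def fit_exponential(xs, x_min):
    """Shifted exponential MLE on the same tail: p(x) = lam * exp(-lam * (x - x_min))."""
    tail = [x for x in xs if x >= x_min]
    n = len(tail)
    excess = sum(x - x_min for x in tail)
    lam = n / excess
    ll = n * math.log(lam) - lam * excess
    return lam, ll


# Synthetic check: Pareto samples with alpha = 2.5 via inverse-transform sampling,
# x = x_min * (1 - u) ** (-1 / (alpha - 1)) with u uniform on [0, 1).
rng = random.Random(42)
xs = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(5000)]
alpha, ll_pl = fit_power_law(xs, 1.0)
lam, ll_exp = fit_exponential(xs, 1.0)
print(f"alpha_hat = {alpha:.3f}, log-likelihood ratio = {ll_pl - ll_exp:.1f}")
```

A full analysis would also report the significance of the log-likelihood ratio, as [8] describes; the sign alone is only indicative.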

Outdegree has a strict budget: large, monolithic functions are difficult to read and reuse. Thus outdegree is minimized on a local, immediate scale. On the other hand, large indegree is implicitly encouraged, up to a point; indegree selection, however, happens on a non-local scale and over a much longer time period, and backward compatibility usually permits lazy pruning or modification of such nodes. Consequently, one would expect the variability of outdegree, as depicted by the length of the error bar, to be far smaller than that of indegree. This is consistent with the observation (Fig. 3). Note that the tail of the outdegree distribution is prominent in OCaml and C++: languages that allow highly stylized call composition.

Figure 3. Average Degree

Figure 4. Epidemic Threshold Vs N

Such observations are critical, as distributions portend the accuracy of sample estimates. In particular, distributions such as the power law, which can have non-finite mean and variance and consequently elude the central limit theorem, are very poor candidates for simple sampling-based analyses; understanding the degree distribution is thus of both empirical and theoretical importance.

Consider the bug propagation process delineated in Section 3. Assuming that inter-node bug propagation is Markovian, we could construct a finite state space Markov chain (not unlike [6]) with bug introduction rate $\beta$ and debugging (immunization) rate $\delta$ as parameters. Note that this Markov chain has two absorbing states: all-infected and all-cured. Equipped with these notions, we could ask what the minimal critical infection rate $\beta_c$ is, beyond which no amount of immunization will save the software; below $\beta_c$ the system converges exponentially to the good, all-cured absorbing state. It is known that for a sufficiently large power-law network with exponent in the range $2 < \gamma < 3$, $\beta_c$ is zero [5]. Thus one is tempted to conclude that, provided the Markovian assumption holds, it is statistically impossible to construct an all-reliable program. However, that would be inaccurate: since bug propagation is symmetric², the relevant degree is the sum of indegree and outdegree, and this sum need not follow a power law. Moreover, a recent study [25] establishes that, for finite networks, the epidemic threshold is governed by the spectral radius of the graph; in particular, the critical ratio of infection to cure rate is $\beta_c/\delta = 1/\lambda_{1,A}$, where $\lambda_{1,A}$ is the largest eigenvalue of the adjacency matrix. Figure 4 depicts the relation between $\lambda_{1,A}$ and the graph size $n$. For "robust" software, we require $\beta_c$ to be large or, equally, $\lambda_{1,A}$ to be small. However, it is evident from the plot that the larger the graph, the higher $\lambda_{1,A}$. This trend is observed uniformly across languages. Thus, we conclude that large programs tend to be more fragile, confirming established wisdom. Another equally important inference one can make from the indegree distribution is that uniform fault testing is bound to fail: if one is to build statistically robust software, testing efforts ought to be heavily biased. These two inferences align closely with common wisdom, except that here they are rigorously established (and partly explained) using the statistical nature of call graphs.

² Bug propagation is symmetric: foo and bar can pass/return bugs to one another.
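The spectral quantity $\lambda_{1,A}$ can be estimated without a linear-algebra library. The sketch below is an illustration under stated assumptions, not the measurement code: it symmetrizes the call graph (treating bug propagation as symmetric), runs power iteration on $A + I$ so that the dominant eigenvalue is unique even for bipartite graphs, and reports $\lambda_{1,A}$ together with the threshold $1/\lambda_{1,A}$ from [25]. Function names are hypothetical.

```python
import math


def spectral_radius(adj, max_iter=10_000, tol=1e-12):
    """Estimate lambda_{1,A}, the largest eigenvalue of the symmetrized adjacency matrix.

    Power iteration is run on A + I: the shift leaves eigenvectors unchanged but
    makes the dominant eigenvalue unique even for bipartite graphs, where A alone
    has both +lambda and -lambda.
    """
    nodes = list(set(adj) | {w for ws in adj.values() for w in ws})
    index = {v: i for i, v in enumerate(nodes)}
    nbrs = [set() for _ in nodes]
    for v, ws in adj.items():
        for w in ws:
            if v != w:  # symmetrize and drop self-loops
                nbrs[index[v]].add(index[w])
                nbrs[index[w]].add(index[v])
    if not any(nbrs):  # no edges at all
        return 0.0
    n = len(nodes)
    x = [1.0 / math.sqrt(n)] * n
    estimate = 0.0
    for _ in range(max_iter):
        y = [x[i] + sum(x[j] for j in nbrs[i]) for i in range(n)]  # y = (A + I) x
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
        converged = abs(norm - estimate) < tol
        estimate = norm
        if converged:
            break
    return estimate - 1.0  # undo the shift


def epidemic_threshold(adj):
    """Threshold 1 / lambda_{1,A}; infinite for a graph without edges."""
    lam = spectral_radius(adj)
    return math.inf if lam == 0 else 1.0 / lam


triangle = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(spectral_radius(triangle), epidemic_threshold(triangle))
```

Adding edges can only increase $\lambda_{1,A}$ for a nonnegative symmetric matrix, which is one way to see why growing programs tend towards a lower threshold.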

Scale Free Metric: Fig. 5 shows how the scale-free metric of the symmetrized call graphs varies across programs. Two observations are critical. First, $S(g)$ is close to zero. This implies that call graphs are scale rich, not scale-free. This matters because in truly scale-free networks epidemics are even harder to handle: hubs are connected to hubs, and the Markov chain rapidly converges to the all-infected absorbing state. In scale-rich networks, as hubs tend to connect to lesser nodes, the rate of convergence is less rapid. Second, $S(g)$ appears to be language independent: both near-zero and higher $S(g)$ values appear in all languages. Thus call graphs, though their indegree follows a power law, are not fractal-like in the self-similarity sense.

Degree Correlation: Fig. 7 shows how indegree-indegree (i-i) and outdegree-outdegree (o-o) degrees correlate with each other. These sets are weakly assortative, signifying hierarchical organization.

But a finer picture emerges as far as languages are concerned. C programs appear to have very similar i-i and o-o profiles, with the o-o correlation being smaller than, but comparable to, the i-i correlation. In addition, C's correlation measure is consistently lower than that of other languages and is close to zero; thus, C programs exhibit as much i-i/o-o correlation as a random graph of similar size. In other words, if foo calls bar, the number of calls bar makes is independent of the number of calls foo makes; this implies a less hierarchical program structure, as one would like level n functions to receive fewer calls than level n − 1 functions. For instance, variance(list) is likely to receive fewer calls than sum(list); we would also like level n functions to have higher outdegree than level n − 1 functions. Thus, in a highly hierarchical design, i-i and o-o correlations would be mildly assortative, with i-i being more assortative. For C++, i-i and o-o differ and are not ordered consistently. OCaml and Haskell exhibit a marked difference in correlations: as with C, the o-o correlation is close to zero, but the i-i correlation is orders of magnitude higher than the o-o correlation. That is, OCaml tends to pair up nodes with "proportional" indegree: if foo has an indegree X, bar is likely to receive, say, 2X indegree. One could interpret this result as a sign of stricter hierarchical organization in functional languages.

³ Except Haskell; but this could be an artifact of an edge-limited sample.

Figure 5. Scale Free Metric

Figure 7. Assortativity Coefficient

Figure 6. Clustering Coefficient

Figure 8. Harmonic Geodesic Mean

Clustering Coefficient: Fig. 6 depicts how the call graph clustering coefficients compare to the clustering coefficients of random networks of the same size. The computed clustering coefficients are orders of magnitude higher than their random counterparts, signifying a higher degree of clustering. Also, observe that $\ell$, as depicted in Fig. 8, is of the order of $\log n$. Together these observations make call graphs decidedly small world, irrespective of the source language.

We have also observed that the average clustering coefficient for nodes of a particular degree, $C(d_i)$, follows a power law. That is, the plot of $C(d_i)$ against $d_i$ follows a power law, with $C(d_i)$ decreasing as $d_i$ grows: high-degree nodes exhibit less clustering and low-degree nodes exhibit more clustering. We also observed that OCaml's data had the least misfit for this power law. Though we need further samples to confirm it, we believe functional languages exhibit a cleaner, non-interacting hierarchy compared to both procedural and OO languages.

Component Statistics: Fig. 9 gives the component statistics for the data set. It depicts the number of weakly connected components (#WCC), the number of strongly connected components (#SCC) and the fraction of nodes in the largest strongly connected component (%SCC).

#WCC is lower in C and OCaml. For C++ and Haskell, #WCC is higher than for the rest of the sample. This is an indication of lazy call resolution, coinciding with the delayed/lazy binding encouraged by both languages. The #SCC values are highest for OCaml. This observation, combined with the reciprocity of OCaml programs, suggests that OCaml encourages recursion at varying granularity. At the other end, C++ has the lowest #SCC values.

Another important aspect of Fig. 9 is the observed values of %SCC; this fraction varies, surprisingly, from 1% to 30% of the total number of nodes. C leads the way, with some applications, notably vim and Emacs, measuring as much as 20 to 30%. OCaml follows C with a moderate 2 to 6%, while C++ measures 1% to 3%. We do not yet know why as much as a third of an application clusters into a single SCC. Also, the %SCC values indicate that certain languages, notably OCaml, and program domains (editors: Vim and Emacs) exhibit significant mutual connectivity.

Edge Reciprocity: Fig. 11 shows the plot of edge reciprocity for various programs. Edge reciprocity is a measure of direct mutual recursion in the software. High reciprocity in a layered system implies layering inversion, and we would ideally like a program to have negative reciprocity.



Figure 9. Component Statistics 



Figure 11. Edge Reciprocity 




Figure 10. Betweenness - C.Linux 



Figure 12. Betweenness - C++.Coin 



Most programs exhibit close to zero reciprocity: most call graphs exhibit as much reciprocity as random graphs of comparable size. None exhibits negative reciprocity, implying no statistically significant preference for avoiding violations of strict layering.

The software with the least reciprocity is the Linux kernel. Recursion of any kind is abhorred inside the kernel, as the kernel stack is a limited resource; besides, in an environment where multiple contexts/threads communicate using shared memory, mutual recursion could happen through continuation flow, not just through explicit control flow. Functional languages like OCaml naturally show higher reciprocity. Another curious observation is that compilers, both OCamlc and gcc, appear to have relatively higher reciprocity. This is the second instance of applications (compilers: GCC and OCamlc) determining a graph property; it could be seen as a reflection of how compilers work: a great deal of the lexing, parsing and semantic algorithms that compilers are based on follow rich, mutually recursive mathematical definitions.



Clustering Profile: As we see in Fig. 13, the clustering profile indeed gives us better insight. The Y axis depicts the average clustering coefficient for nodes, say $i$ and $j$, that are connected by geodesic distance $d_{ij}$. In all the graphs observed, this average clustering increases up to $d_{ij} = 3$ and falls rapidly as $d_{ij}$ increases further. We measured the clustering profile for degrees one to ten, and the clustering profile appears to be unimodal, reaching its maximum at $d_{ij} = 3$ irrespective of language or program domain. This suggests that maximal clustering occurs between nodes that are separated by exactly five hops: the clustering profile for a node $u$ is measured with $u$ removed, so $d_{ij} = 3$ corresponds to 5 hops in the original graph. However exciting we find this result to be, we currently have no explanation for this phenomenon.

Betweenness: Figs. 10 to 15 depict how betweenness centrality is distributed in different programs written in different languages. Note that betweenness is not distributed uniformly: it follows a rapidly decaying exponential distribution. This confirms our observation that the importance of functions is distributed non-uniformly. Thus, by concentrating test efforts on functions that have higher betweenness (functions that are central to most paths), we could test the software better, possibly with less effort. An interesting line of investigation is to measure the correlation between various centrality measures and actual per-function bug density in real-world software.

7 Related Work 

Understanding graph structures originating from various fields is an active area of research with a vast literature; there is renewed enthusiasm for studying the graph structure of software, and many studies, alongside ours, report that software graphs exhibit small-world and power-law properties.

[18] studies call graphs and reports that both indegree and outdegree distributions follow power-law distributions and that the graph exhibits hierarchical clustering. But [23] suggests that indegree alone follows a power law, while outdegree admits an exponential distribution. [23] also suggests that a growing network model with copying, as proposed in [15], would consistently explain the observations.

Figure 13. Clustering Profile for Neighbours reachable in k+2 hops

Figure 14. Betweenness - OCaml.Coq

More recently, [4] studies the degree distributions of various meaningful relationships in Java software: many relationships admit power-law distributions in their indegree and exponential distributions in their outdegree. [22] studies the dynamic points-to graph of objects in Java programs and finds that these graphs follow a power law.
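Degree distributions like those discussed above are typically tabulated from a caller-callee edge list. The minimal Python sketch below assumes the call graph is given as (caller, callee) name pairs, a hypothetical input format, and computes the indegree and outdegree histograms together with their complementary cumulative distributions (CCDFs). Functions that neither call nor are called cannot appear in an edge list and are therefore missed.

```python
from collections import Counter


def degree_histograms(edges):
    """Indegree and outdegree histograms {k: number of functions with degree k}.

    `edges` is an iterable of (caller, callee) pairs; repeated pairs (several
    call sites between the same two functions) are counted once.
    """
    edges = set(edges)
    nodes = {f for edge in edges for f in edge}
    indeg = Counter(callee for _, callee in edges)
    outdeg = Counter(caller for caller, _ in edges)
    # Counter lookups return 0 for functions with no in- or out-edges.
    in_hist = Counter(indeg[f] for f in nodes)
    out_hist = Counter(outdeg[f] for f in nodes)
    return in_hist, out_hist


def ccdf(hist):
    """Pairs (k, P(K >= k)) for each observed degree k, in increasing k."""
    total = sum(hist.values())
    remaining = total
    points = []
    for k in sorted(hist):
        points.append((k, remaining / total))
        remaining -= hist[k]
    return points


if __name__ == "__main__":
    # Hypothetical edge list; the duplicate ("main", "parse") is ignored.
    edges = [("main", "parse"), ("main", "run"), ("parse", "util"),
             ("run", "util"), ("util", "log"), ("main", "parse")]
    in_hist, out_hist = degree_histograms(edges)
    print("indegree CCDF: ", ccdf(in_hist))
    print("outdegree CCDF:", ccdf(out_hist))
```

On log-log axes a power-law CCDF appears roughly as a straight line, while an exponential CCDF bends downwards (it is roughly straight on semi-log axes instead). This is only a visual check, not a statistical test.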

Note that most of this work, with the exception of [4], does not rigorously compare how well other distributions explain the same data. Power laws are notoriously difficult to fit, and even when a power law is a genuine fit, it might not be the best fit [8].
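Such a likelihood comparison can be sketched in a few lines. The Python sketch below fits a continuous power law and a shifted exponential to the tail x ≥ xmin by maximum likelihood and reports their log-likelihood ratio, together with a normalised-ratio (Vuong-style) significance test of the kind used in the power-law fitting literature. It makes simplifying assumptions: degrees are treated as continuous, which is a crude approximation for small degrees, and `xmin` is supplied by the caller rather than estimated from the data. The synthetic samples in the usage part are illustrative only and are not data from this study.

```python
import math
import random


def compare_powerlaw_exponential(data, xmin):
    """Compare continuous power-law and exponential fits to the tail x >= xmin.

    Returns (alpha, lam, R, p):
      alpha: maximum-likelihood power-law exponent,
      lam:   maximum-likelihood rate of the exponential shifted to start at xmin,
      R:     log-likelihood ratio ll(power law) - ll(exponential);
             R > 0 favours the power law, R < 0 the exponential,
      p:     two-sided p-value for the sign of R (Vuong-style normal test);
             a large p means the data cannot tell the two models apart.
    """
    xs = [float(x) for x in data if x >= xmin]
    n = len(xs)
    if xmin <= 0 or n < 2:
        raise ValueError("need xmin > 0 and at least two points in the tail")
    logs = [math.log(x / xmin) for x in xs]
    shifts = [x - xmin for x in xs]
    if sum(shifts) == 0:
        raise ValueError("all points equal xmin; nothing to fit")

    alpha = 1.0 + n / sum(logs)   # p(x) = (alpha-1)/xmin * (x/xmin)^(-alpha)
    lam = n / sum(shifts)         # p(x) = lam * exp(-lam * (x - xmin))

    # Point-wise log-likelihood differences between the two fitted models.
    diffs = [
        (math.log((alpha - 1.0) / xmin) - alpha * lg)   # power-law log-density
        - (math.log(lam) - lam * sh)                    # exponential log-density
        for lg, sh in zip(logs, shifts)
    ]
    R = sum(diffs)
    mean = R / n
    var = sum((d - mean) ** 2 for d in diffs) / n
    if var == 0:
        p = 1.0 if R == 0 else 0.0
    else:
        p = math.erfc(abs(R) / math.sqrt(2.0 * n * var))
    return alpha, lam, R, p


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic samples: a Pareto tail with alpha = 2.5 (inverse-CDF sampling)
    # and an exponential shifted to start at 1.
    pareto = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(5000)]
    expo = [1.0 + rng.expovariate(1.0) for _ in range(5000)]
    for name, sample in (("pareto", pareto), ("exponential", expo)):
        alpha, lam, R, p = compare_powerlaw_exponential(sample, xmin=1.0)
        print(f"{name:11s} alpha={alpha:.2f} lam={lam:.2f} R={R:.1f} p={p:.3g}")
```

A positive `R` with a small `p` supports the power law over the exponential for that particular tail; it does not show that a power law is the best of all candidate distributions, and the conclusion can change with the choice of `xmin`.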

8 Conclusion & Future Work 

We have studied the structural properties of large software systems written in different languages and serving different purposes. We have measured various finer aspects of these large systems in detail and argued why such measures could be useful; we have also described situations where such measurements are of practical benefit. We believe our study is a step towards understanding software as an evolving graph system with distinct characteristics, a viewpoint we consider important for developing and maintaining large software systems.

There is a lot that remains to be done. First, we need to measure the correlation between these precise quantities and the qualitative, rule-of-thumb understanding that developers usually possess; this would help make such qualitative, albeit useful, observations rigorous. Second, we need to verify our findings over a much larger set of programs to improve confidence in our inferences. Finally, graphs are extremely useful objects that can be analysed in a variety of ways, each exposing relevant features; of these, the authors find two fields very promising: topological and algebraic graph theory. In particular, studying call graphs using a variant of Atkin's A-homotopy theory is likely to yield interesting results [3]. Spectral methods applied to call graphs are also an area that we think is worth investigating.
Appendix I

 #  Lang     Name.Version       Appln. Domain       |V|     |E|
 1  C        scheme.mit.7.7.1   Interpreter        2512    5610
 2  C        linux.2.6.12.rc2   Kernel            20165   70010
 3  C        httpd.2.2.4        Web Server         1396    4014
 4  C        bind.9.4.1         Name Server        4534   18874
 5  C        sendmail.8.12.8    Mail Server         783    4064
 6  C        ffmpeg.2007.05     Media Codecs       4207   11692
 7  C        glibc.2.3.6        C Lib              4401   13972
 8  C        MPlayer.1.0rc1     Media Player       7985   21744
 9  C        gcc.4.0.0          C Compiler        10848   48847
10  C        fvwm.2.5.18        Win Manager        3312   12052
11  C        gimp.2.3.9         Image Editor      16021   88473
12  C        postgresql.8.2.3   R-DBMS             8517   41189
13  C        framerd.2.6.1      OO-DBMS            3490   17048
14  C        emacs.21.4         "Editor"           3872   13154
15  C        vim70              Editor             4489   18368
16  C        sim.outorder.3.0   µarch/ISA Sim       442    1089
17  C        openssl.0.9.8e     Crypto Lib         7078   21827
18  C        gnuplot.4.2.2      Graph Plotting     2191    7045
19  C++      cccc.3.1.4         Code Metrics       1654    5627
20  C++      kcachegrind.0.4    Cache Analyser     2593    8054
21  C++      cgal.3.3           CompGeom Lib       3151    7690
22  C++      coin.2.4.6         OpenGL 3D Lib     12963   51877
23  C++      doxygen.1.5.3      Doc Generator     11723   31889
24  C++      knotes.3.3         PIM                2174    4942
25  OCaml    ocamlc.opt.3.09    OCaml Compiler     4397   12732
26  OCaml    coqtop.opt.8       Theorem Prover    16126   51092
27  OCaml    fftw.3.2alpha2     FFT Computing       585    1011
28  OCaml    glsurf.2.0         OpenGL Surface     4003    9173
29  OCaml    psilab.2.0         Numeric Envirn     1888    4341
30  OCaml    ott.0.10.11        PL/Calculi Tool    4300   11193
31  Haskell  yarrow.1.2         Theorem Prover     9397   15199
32  Haskell  Frown.0.6.1        Parser Gen         6796   10218
33  Haskell  DrIFT.2.2.1        Typed Preproc      1428    3292
34  Haskell  HaXml.1.Validate   XML Validate       4117    7624
35  Haskell  HaXml.1.Xtract     XML grep           3909    5242